Singapore's indicators in response to health outbreaks

Data analysis project report

1 Introduction, aims and data

1.1 Introduction

In recent years, the world's attention has been heavily skewed toward healthcare, with the COVID-19 pandemic as the spark. All countries have been affected to varying degrees, and many aspects of everyday life have not been the same since. Data analytics has been used widely in the global fight against COVID-19, both for contact tracing and for tracking the general trend of infections in a country. For instance, a 2021 study stated that analyzing health data in real time with AI techniques will have a vital role in predictive and preventive healthcare (Alsunaidi et al., 2021). In this report, I explore two facets of a country that are impacted by health outbreaks, namely tourism and the population death rate. In particular, I visualise the data to draw insights from observable trends. These trends can indicate how well a country has been managed during difficult events. I have not found any other analysis online that uses a similar set of indicators to evaluate a country's response to health outbreaks.

I will focus on indicators from Singapore over a 22-year period (2000-2021), during which the country faced the Severe Acute Respiratory Syndrome (SARS) outbreak (2003-2004), the Influenza A (H1N1) pandemic (2009) and the ongoing COVID-19 pandemic.

1.2 Aims and objectives

For this analysis, the aims are outlined below:

  1. Decide on appropriate data sources and formats, such as csv
  2. Web scrape data from an appropriate source
  3. Clean and prepare/transform the data for analysis, and identify the key features to measure
  4. Perform exploratory data analysis to draw insights and conclusions from observable trends

The objectives are to measure the impact of mass health outbreaks on tourism and the population death rate. Both indicators are directly affected when a health outbreak occurs. The borders of a country may be tightened, or the international perception of the country may be harmed, lowering tourist arrivals. An outbreak might also increase mortality, which can be identified via the population death rate. Fluctuations in these two indicators are key evaluators of a country's incident response management. A few questions that can be asked during the analysis are:

How have tourist arrivals to Singapore been affected during pandemic periods?

How did pandemic periods affect the population death rate?

Which age groups were most affected according to death rate?

Which pandemic can be observed to be the most severe?

1.3 Data relevance and justification

Data from WorldData (https://www.worlddata.info/) and the Singapore Department of Statistics (DOS) (https://www.singstat.gov.sg/) were chosen because their time series cover the periods of the above-mentioned pandemics. WorldData provides the tourism indicator, while DOS provides the population death rate indicator. If the WorldData data requires supplementing, DOS will be used as well. The reliability and accuracy of the data are supported by its provenance: WorldData states that its data is based on information from the United Nations World Tourism Organisation (UNWTO), while DOS is the official governmental statistics department of Singapore. WorldData will be web scraped, and the DOS data will be downloaded in csv format.

As a comparison, Kaggle (https://www.kaggle.com/) was also considered as a data source. However, as data can be uploaded by anyone on that platform, the sources may lack transparency and the data trail might not be verifiable. Data from Kaggle might therefore not be as accurate as data from WorldData and DOS.

1.4 Data limitations and constraints

WorldData currently has no tourism data for 2021, although the COVID-19 pandemic has stretched well into 2021. A solution is to supplement it with data from DOS.

Although three pandemics in the span of 22 years is epidemiologically significant, the small number of occurrences limits the data points available and may restrict the richness of insights that can be gathered. More limitations and constraints are elaborated on during the analysis.

1.5 Ethical consideration

If you need this data for school or university, but do not earn money and do not spread it otherwise, you are welcome to use them.

DOS allows non-exclusive use of the data as long as the source is credited to them. Here is the clause on conditions of use from the site:

Subject to these Terms of Use, we grant you free, worldwide, perpetual and non-exclusive use of the Contents made available on this Website for the purpose of (a) copying, distribution or transmission of the Contents and (b) using the Contents to develop or derive, for sale or otherwise, any products and services or to resell the Contents in any form to any Third Party, provided that you:

a. credit the source of the Contents;

b. use the Contents in a way that is legal and in accordance with all applicable laws;

c. ensure that no analysis or transformation of the Contents may be presented in a manner which suggests or is likely to lead to the belief that the analysis or transformation of the Contents is attributed to us;

d. cease to use the Contents and remove them from your applications or websites upon our request in the event that the Contents are no longer provided on this Website or of a breach of any of these Terms of Use;

e. ensure the datasets and data in the Contents are accurately reproduced; and

f. do not use the Contents in a way that suggests we are associated to you or we endorse you or your use of the Contents.

1.6 Potentially problematic aspects

The data used in this report has been assessed to pose minimal harm (if any) to individuals and organisations, as it contains no individual identification and comes from governmental data sources. The analysis might have the potential to form new intellectual property (IP), since the findings could be considered a basic measure of a country's performance in response to health outbreaks. However, I would like to state that the findings from this report are not new IP and are based on specific transformations of data for the purpose of this report.

This report strives to be factual and objective about the above-stated indicators of Singapore during health outbreaks. The analysis does not claim to be an ultimate representation of Singapore's performance in pandemic management. Conclusions will be neutral and drawn objectively from the data.

1.7 Onward use of data and findings

Anyone who wishes to use the data sources in this report has to separately abide by the clauses that govern the respective sources. Use of the data transformations or findings from this report is also governed by the same clauses from the original sources. The transformation and analysis of the data were done by me and are in no way attributed to the original data sources.

2 Data Preparation

2.1 Data import, web scraping and cleaning

Here, data on tourist arrivals in Singapore is scraped from WorldData. The code below scrapes the data in a few seconds:
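A minimal sketch of such a scraper, using only the Python standard library. The page URL is an assumption (the report does not state which WorldData page was scraped), and `TableTextParser`, `parse_table_rows` and `scrape_rows` are hypothetical names introduced for illustration. The sketch simply collects the text of every table cell on the page, row by row, and leaves the selection of the tourism table to the cleaning step.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumed page address; the report does not give the exact URL that was scraped.
TOURISM_URL = "https://www.worlddata.info/asia/singapore/tourism.php"


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_rows(url=TOURISM_URL):
    """Download a page and return its table rows."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_table_rows(html)
```

Keeping the download (`scrape_rows`) separate from the parsing (`parse_table_rows`) means the parser can be checked against a saved copy of the page without repeatedly hitting the site.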

Save the scraped data into a csv file for later use:
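One way to persist the scraped rows, sketched with the standard `csv` module. The function name `save_rows` and the default file name `tourism_raw.csv` are hypothetical.

```python
import csv


def save_rows(rows, path="tourism_raw.csv"):
    """Write scraped table rows to a csv file, one table row per line."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path
```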

Import the previously scraped data from the csv file, keep only tourist arrivals by year, and remove alphabetic characters:

As the year and the arrival number are currently combined in one value, separate them and convert the result to a dataframe. Check for out-of-bounds values:

The population death rate was taken from the DOS website (https://tablebuilder.singstat.gov.sg/table/TS/M810141) and downloaded as a csv file. The data contains the overall death rate as well as a breakdown by age group.
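A sketch of this cleaning step under stated assumptions: the saved file is assumed to hold the year in its first column and the arrivals figure (possibly followed by a unit such as a letter suffix) in its second, and rows that do not start with a four-digit year are treated as headers. `load_arrivals` is a hypothetical name.

```python
import csv
import re


def load_arrivals(handle):
    """Return 'year arrivals' strings with alphabetic characters removed."""
    cleaned = []
    for row in csv.reader(handle):
        # Skip header rows and rows without a year in the first column
        if len(row) < 2 or not row[0].strip()[:4].isdigit():
            continue
        text = re.sub(r"[A-Za-z]", "", f"{row[0]} {row[1]}")
        cleaned.append(" ".join(text.split()))  # collapse leftover spaces
    return cleaned
```

It could be called as `with open("tourism_raw.csv", newline="", encoding="utf-8") as fh: combined = load_arrivals(fh)`, where the file name is again an assumption.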
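A possible version of the split and the bounds check with pandas, assuming each combined value has the form `"<year> <arrivals>"`. The column names `year` and `arrivals`, the function names and the default year limits are illustrative choices, not taken from the original code.

```python
import pandas as pd


def to_arrivals_frame(combined):
    """Split 'year value' strings into a two-column dataframe."""
    # Each string must contain exactly two whitespace-separated parts
    pairs = [item.split() for item in combined]
    frame = pd.DataFrame(pairs, columns=["year", "arrivals"])
    frame["year"] = frame["year"].astype(int)
    frame["arrivals"] = frame["arrivals"].astype(float)
    return frame


def out_of_bounds(frame, first_year=1990, last_year=2021):
    """Return the rows whose year or arrivals value is implausible."""
    bad_year = ~frame["year"].between(first_year, last_year)
    bad_value = frame["arrivals"].isna() | (frame["arrivals"] < 0)
    return frame[bad_year | bad_value]
```

An empty result from `out_of_bounds` indicates that every year lies inside the expected range and no arrival figure is missing or negative.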

Import data from csv file and convert to dataframe.
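A sketch of the import, assuming the downloaded file has a wide layout with one row per series (the overall rate first, then the age groups) and one column per year. The `skiprows` parameter is there because such downloads may carry title lines above the header; how many, if any, is not known from the report. `load_death_rate` is a hypothetical name.

```python
import pandas as pd


def load_death_rate(source, skiprows=0):
    """Read a wide csv: one row per series, one column per year."""
    frame = pd.read_csv(source, skiprows=skiprows, index_col=0)
    frame.index = frame.index.str.strip()
    frame.columns = frame.columns.astype(str).str.strip()
    # Non-numeric placeholders (e.g. "na") become NaN
    return frame.apply(pd.to_numeric, errors="coerce")
```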

Check for out of bounds values:
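The check could look like the following for a wide table with year columns. The year limits and the function name `check_death_rate` are assumptions; no upper bound is placed on the rates because the unit of the series is not stated here.

```python
import pandas as pd


def check_death_rate(frame, first_year=1950, last_year=2021):
    """Return counts of suspicious columns and values in a wide table."""
    years = pd.to_numeric(pd.Series(frame.columns), errors="coerce")
    return {
        # Column labels that are not a year inside the expected range
        "bad_year_columns": int((~years.between(first_year, last_year)).sum()),
        "negative_values": int((frame < 0).sum().sum()),
        "missing_values": int(frame.isna().sum().sum()),
    }
```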

2.2 Defining features and data modification

Now that the main data has been imported into dataframes, it can be processed to define the final dataframes for visualisation. The dataframes can also be restricted to the appropriate time period. The indicators discussed earlier for the analysis were tourism and population death rates. Features have to be defined in order to proceed with visualisation and analysis. As the death rate dataset consists of the overall rate and a breakdown by age group, the data can be split into two features. The final key features used to visualise trends will be:

Tourist arrival rates

Overall population death rate

Death rate by age groups

The defined time period is 2000 to 2021. As the tourist arrivals data only runs up to 2020, data for 2021 has to be supplemented from DOS (https://tablebuilder.singstat.gov.sg/table/TS/M550001) as a downloaded csv file. However, the supplementary data is in monthly format, so it has to be aggregated into a single annual figure.

Restrict data to months of 2021 and total the numbers:
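A sketch of the aggregation, assuming the monthly figures are held in a pandas Series whose labels begin with the year (for example `"2021 Jan"`). The label format and the name `annual_total` are assumptions.

```python
import pandas as pd


def annual_total(monthly, year=2021):
    """Sum the monthly values whose label starts with the given year."""
    mask = [str(label).strip().startswith(str(year)) for label in monthly.index]
    return monthly.loc[mask].sum()
```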

Add the value for year 2021 into arrivals dataframe and restrict year range to 2000-2021:
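One way to append the extra year and apply the year filter, assuming an arrivals dataframe with hypothetical `year` and `arrivals` columns. The supplementary value must be expressed in the same unit as the existing figures; if the two sources report in different units, it has to be converted first.

```python
import pandas as pd


def add_year_and_restrict(arrivals, year, value, start=2000, end=2021):
    """Append one (year, value) row and keep only years in [start, end]."""
    extra = pd.DataFrame({"year": [year], "arrivals": [value]})
    combined = pd.concat([arrivals, extra], ignore_index=True)
    # If the year already exists, the newly supplied value wins
    combined = combined.drop_duplicates(subset="year", keep="last")
    in_range = combined["year"].between(start, end)
    return combined[in_range].sort_values("year").reset_index(drop=True)
```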

The overall population death rate has to be extracted from the existing death_rate dataframe, processed and restricted to the year range 2000-2021:

Remove the first row from the death_rate dataframe so that only the age groups remain, then process and restrict to the year range 2000-2021:
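A sketch of the extraction, assuming `death_rate` is a wide dataframe whose first row is the overall rate and whose column labels are years. The output column names `year` and `rate` and the function name `overall_rate` are illustrative.

```python
import pandas as pd


def overall_rate(death_rate, start=2000, end=2021):
    """Turn the first (overall) row of a wide table into a year/rate frame."""
    row = death_rate.iloc[0]
    frame = pd.DataFrame({
        "year": pd.to_numeric(row.index, errors="coerce"),
        "rate": row.to_numpy(dtype=float),
    })
    # Drop columns outside the period (and any non-year labels)
    frame = frame[frame["year"].between(start, end)].copy()
    frame["year"] = frame["year"].astype(int)
    return frame.sort_values("year").reset_index(drop=True)
```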
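The age-group table could be derived in the same spirit, again assuming a wide `death_rate` dataframe with the overall rate in the first row and year labels as columns. `age_group_rates` is a hypothetical name.

```python
import pandas as pd


def age_group_rates(death_rate, start=2000, end=2021):
    """Drop the overall row and keep year columns in [start, end], ascending."""
    groups = death_rate.iloc[1:]
    years = pd.to_numeric(pd.Series(groups.columns), errors="coerce")
    keep = years.between(start, end).to_numpy()
    groups = groups.loc[:, keep]
    return groups[sorted(groups.columns, key=float)]
```

The result keeps age groups as rows and years as columns, which suits drawing one line per year across the age groups.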

3 Analysis and conclusions

3.1 Exploratory data analysis

Three dataframes have now been prepared for visualisation. Each is plotted and analysed over three time periods: 2003-2004 for SARS, 2009 for H1N1 and 2019 onwards for COVID-19. This section strives to answer the questions set out in the aims and objectives while elaborating on limitations and constraints.

Below is the histogram plot for tourist arrival rates:
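A chart of this kind, with one bar per year, could be drawn as in the sketch below. It assumes matplotlib, which may differ from the plotting library used for the original figures; `plot_yearly` is a hypothetical helper, and the highlighted years follow the pandemic periods defined in this report.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# Pandemic periods as defined in this report
PANDEMIC_YEARS = {2003: "SARS", 2004: "SARS", 2009: "H1N1",
                  2019: "COVID-19", 2020: "COVID-19", 2021: "COVID-19"}


def plot_yearly(years, values, ylabel, title):
    """Bar chart of one value per year, pandemic years highlighted in red."""
    years = list(years)
    fig, ax = plt.subplots(figsize=(10, 4))
    colors = ["tab:red" if y in PANDEMIC_YEARS else "tab:blue" for y in years]
    ax.bar(years, list(values), color=colors)
    ax.set_xticks(years)
    ax.set_xticklabels([str(y) for y in years], rotation=90)
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax
```

The same helper can be reused for the overall death rate by passing that dataframe's year and rate columns.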

In 2003-2004 (SARS), tourist arrivals dipped in 2003 compared to previous years. By 2004 the trend had recovered, even outperforming the years before 2003. In 2009 (H1N1) there was only a slight dip. The most significant change comes after COVID-19 started, with a six-fold decrease in tourist arrivals in 2020. An even steeper decrease in 2021 reduced the number to the lowest in 22 years. The combined number for 2020 and 2021 was lower than that of any other single year.

How have tourist arrivals to Singapore been affected during pandemic periods?

Based on the above observations, all three pandemics impacted tourist arrivals negatively to varying degrees. 2009 (H1N1) had the least impact, while 2019 onwards (COVID-19) was the hardest hitting.
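A fold decrease is the ratio of the earlier value to the later one, so a six-fold decrease means the later year had one sixth of the earlier year's arrivals. A small helper, with the hypothetical name `fold_decrease`, makes the calculation explicit:

```python
def fold_decrease(before, after):
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after
```

Applied to the arrivals dataframe, it would be called with the arrival figures of two consecutive years.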

Below is the histogram plot for overall population death rate:

2003 (SARS) saw a slight increase in the death rate, which dropped back to normal by 2004. Surprisingly, 2009 (H1N1) saw a slight dip in the rate despite being a pandemic year. For 2019 onwards (COVID-19), 2020 saw a slight increase, while 2021 surpassed all previous years with a large jump in the rate.

How did pandemic periods affect the population death rate?

Based on the above observations, all three pandemic periods coincided with a change in the population death rate. The rate rose in 2003 (SARS) and from 2019 onwards (COVID-19), with COVID-19 showing the sharpest increase. In 2009 (H1N1) the rate fell slightly.

Below is the graph plot for death rate by age groups:
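A graph with one line per year across the age groups could be sketched as follows, assuming matplotlib (possibly not the library used for the original figure) and a wide dataframe with age groups as rows and years as columns. `plot_age_groups` is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_groups(rates, log_scale=False):
    """Draw one line per year column, with age groups along the x axis."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for year in rates.columns:
        ax.plot(list(rates.index), rates[year].to_numpy(), label=str(year))
    if log_scale:
        ax.set_yscale("log")
    ax.set_xlabel("Age group")
    ax.set_ylabel("Death rate")
    ax.tick_params(axis="x", rotation=90)
    ax.legend(title="Year", ncol=2, fontsize="small")
    fig.tight_layout()
    return fig, ax
```

With `log_scale=True`, an exponential rise with age appears as a roughly straight line, which also makes differences between years in the younger age groups easier to see.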

For this graph, each trace indicates a year (e.g. trace 21 is the year 2021). An exponential trend can be observed across the age groups, with the death rate rising with age. There are no observable outliers. For 2003-2004 (SARS) and 2009 (H1N1) there seems to be little variation from the surrounding years. From 2019 onwards (COVID-19) there is a year-on-year increase from 2020 to 2021. 2021 holds the highest death rate across all age groups in the entire 22-year period.

Which age groups were most affected according to death rate?

The general trend from the above observations is that the oldest age group had the highest death rate. This is expected, as immune systems naturally weaken with age.

Which pandemic can be observed to be the most severe?

COVID-19 was observed to have the largest impact across every indicator, exceeding SARS and H1N1 quantitatively.

Some limitations and constraints

While the death rate may be a general indicator of how well a pandemic has been managed, complex underlying factors can also influence it. These include the infectiousness and lethality of the disease and the resources of the healthcare system. Such influences limit the insights that can be drawn from the death rate alone.

3.2 Conclusions and summary

From the analysis and findings, SARS and H1N1 had a moderate to minimal effect on Singapore's tourism and death rate. Singapore seems to have responded to both pandemics relatively well, recovering quickly in the subsequent year. COVID-19 seems to be an outlier compared to the other pandemics, severely impacting tourism and the death rate. However, as medical science has progressed continually over the two decades, the difference may reflect the nature of the disease rather than the quality of Singapore's response to COVID-19. A 2021 study remarked that Singapore demonstrated a strong capacity to identify, trace and document COVID-19 cases (Menkir et al., 2021).

References

Alsunaidi, S., Almuhaideb, A., Ibrahim, N., Shaikh, F., Alqudaihi, K., Alhaidari, F., Khan, I., Aslam, N. and Alshahrani, M., 2021. Applications of Big Data Analytics to Control COVID-19 Pandemic. Sensors, 21(7), p.2282.

Menkir, T., Chin, T., Hay, J., Surface, E., De Salazar, P., Buckee, C., Watts, A., Khan, K., Sherbo, R., Yan, A., Mina, M., Lipsitch, M. and Niehus, R., 2021. Estimating internationally imported cases during the early COVID-19 pandemic. Nature Communications, 12(1).